The R package SemiPar contains the milan.mort dataset on the short-term effect of air pollution on mortality. The data comprise 3,652 observations on 9 variables, described in the package help file. The data are also analysed in the book by Ruppert, Wand and Carroll (2003). The original reference is:
Vigotti, M.A., Rossi, G., Bisanti, L., Zanobetti, A. and Schwartz, J. (1996). Short term effect of urban air pollution on respiratory health in Milan, Italy, 1980-1989. Journal of Epidemiology and Community Health, 50, 71-75.
I start with an exploratory analysis of the milan.mort dataset, which records the number of deaths and pollution levels in Milan over 3,652 consecutive days, from 1 January 1980 to 30 December 1989.
# data summary
summary(milan.mort)
## day.num day.of.week holiday mean.temp
## Min. : 1.0 Min. :1 Min. :0.00000 Min. :-6.10
## 1st Qu.: 913.8 1st Qu.:2 1st Qu.:0.00000 1st Qu.: 7.10
## Median :1826.5 Median :4 Median :0.00000 Median :13.70
## Mean :1826.5 Mean :4 Mean :0.02793 Mean :14.04
## 3rd Qu.:2739.2 3rd Qu.:6 3rd Qu.:0.00000 3rd Qu.:21.40
## Max. :3652.0 Max. :7 Max. :1.00000 Max. :31.50
## rel.humid tot.mort resp.mort SO2
## Min. : 0.00 Min. :10.00 Min. : 0.000 Min. :-17.57
## 1st Qu.:50.30 1st Qu.:26.00 1st Qu.: 1.000 1st Qu.: 32.25
## Median :62.30 Median :31.00 Median : 2.000 Median : 64.77
## Mean :62.01 Mean :31.86 Mean : 2.202 Mean :114.74
## 3rd Qu.:74.70 3rd Qu.:37.00 3rd Qu.: 3.000 3rd Qu.:157.75
## Max. :99.70 Max. :66.00 Max. :12.000 Max. :827.75
## TSP
## Min. : 3.50
## 1st Qu.: 83.61
## Median :117.09
## Mean :136.49
## 3rd Qu.:170.50
## Max. :529.50
# tot.mort histogram by week day
ggplot(milan.mort, aes(x=tot.mort)) +
geom_bar(aes(fill=factor(day.of.week)), width = 0.5)
day.num is the id for each day
day.of.week is the index for the day of the week (from \(1\) to \(7\))
holiday is an indicator for holiday days
mean.temp is the mean temperature in degrees Celsius
rel.humid is the relative humidity
tot.mort is the total number of deaths
resp.mort is the number of respiratory deaths
SO2 measures the sulphur dioxide level in the air
TSP is the level of total suspended particles
The objective is to build a model for the average number of deaths in Milan, using tot.mort (or a suitable transformation of it) as the response variable, within a Bayesian approach.
Concentrations of SO2 and TSP are highly seasonal, since they tend to rise during winter.
# pollution
ggplot(data = milan.mort, aes(day.num, SO2)) + geom_point(aes(color=mean.temp))
ggplot(data = milan.mort, aes(day.num, TSP)) + geom_point(aes(color=mean.temp))
Here follow some scatterplots showing the relationships between the covariates and the response variable tot.mort.
pairs(tot.mort ~ mean.temp + rel.humid + SO2 + TSP,
data=milan.mort)
# pollution
ggplot(data = milan.mort, aes(log(SO2+50), tot.mort)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
ggplot(data = milan.mort, aes(log(TSP+30), tot.mort)) + geom_point() + geom_smooth(method = "lm", se = FALSE)
I propose considering an additional response variable, called avg.mort, which is simply the rolling average of the number of deaths over the last 5 days.
# avg number of deaths
milan.mort <- milan.mort %>%
mutate(avg.mort = rollmean(x = tot.mort, 5, align = "right", fill = NA))
milan.mort <- milan.mort[5:nrow(milan.mort),]
head(milan.mort)
## day.num day.of.week holiday mean.temp rel.humid tot.mort resp.mort
## 5 5 6 0 2.2 71.3 36 1
## 6 6 7 0 0.7 80.7 45 6
## 7 7 1 0 -0.6 82.0 46 2
## 8 8 2 0 -0.5 82.7 38 4
## 9 9 3 0 0.2 79.3 29 1
## 10 10 4 0 1.7 69.3 39 4
## SO2 TSP avg.mort
## 5 354.25 234.59 36.6
## 6 334.50 167.34 36.6
## 7 619.50 216.49 39.4
## 8 560.61 236.10 39.6
## 9 535.09 251.85 38.8
## 10 524.92 269.95 39.4
# motivation
ggplot(milan.mort, aes(x=day.num, y=tot.mort)) +
geom_point()
ggplot(milan.mort, aes(x=day.num, y=avg.mort)) +
geom_point()
This value is computed for each remaining observation (the first four entries of the dataset, where the 5-day window is incomplete, are discarded) and allows us to account for the effect of toxicity levels in the air over the days preceding each observed value.
Motivated by the above relationships I start with the predictors \(SO2\) and \(TSP\):
\[ \text{avg_mort}_i \sim \mathcal{N}(\mu_i,\sigma^2)\\ \mu_i = \alpha + \beta\; \text{SO2}_i + \gamma \; \text{TSP}_i \] with \(\mathcal{N}(0,30)\) prior distributions on \(\alpha,\beta,\gamma\) and a \(\text{Cauchy}(0,30)\) prior on \(\sigma\).
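The file model1.stan is not shown in the text; a minimal sketch consistent with the stated likelihood and priors, and with the y_rep and log_lik quantities used later (the exact contents of the author's file are an assumption), might be:

```stan
data {
  int<lower=1> N;
  vector[N] avg_mort;
  vector[N] SO2;
  vector[N] TSP;
}
parameters {
  real alpha;
  real beta;
  real gamma;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, 30);
  beta  ~ normal(0, 30);
  gamma ~ normal(0, 30);
  sigma ~ cauchy(0, 30);
  avg_mort ~ normal(alpha + beta * SO2 + gamma * TSP, sigma);
}
generated quantities {
  vector[N] y_rep;    // replicated data for posterior predictive checks
  vector[N] log_lik;  // pointwise log-likelihood for loo
  for (n in 1:N) {
    real mu = alpha + beta * SO2[n] + gamma * TSP[n];
    y_rep[n]   = normal_rng(mu, sigma);
    log_lik[n] = normal_lpdf(avg_mort[n] | mu, sigma);
  }
}
```

Extra entries in the data list (such as mean_temp) are simply ignored by rstan, so the same list can be reused across models.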
## arrange data into a list
data <- list(
N = nrow(milan.mort),
avg_mort = milan.mort$avg.mort,
SO2 = milan.mort$SO2+50, # same +50 offset as in the plots above (SO2 has negative values)
TSP = milan.mort$TSP+30,
mean_temp = milan.mort$mean.temp
)
## compile the model
model1 <- stan_model("model1.stan", auto_write = TRUE)
## fit the model
model1_fit <- sampling(model1, data = data, cores=4)
saveRDS(model1_fit, "model1_fit.rds")
# Load fitted model
model1_fit <- readRDS("model1_fit.rds")
print(model1_fit, pars = c('alpha','beta','gamma','sigma'))
## Inference for Stan model: model1.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 28.17 0.01 0.19 27.78 28.04 28.17 28.29 28.53 1157 1
## beta 0.03 0.00 0.00 0.03 0.03 0.03 0.03 0.03 4235 1
## gamma -0.01 0.00 0.00 -0.01 -0.01 -0.01 -0.01 0.00 1569 1
## sigma 4.72 0.00 0.06 4.61 4.68 4.72 4.76 4.83 1564 1
##
## Samples were drawn using NUTS(diag_e) at Tue Jun 18 09:29:31 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
## traceplots
plot(model1_fit, plotfun = "trace", pars = c('alpha','beta','gamma','sigma')) + ggtitle("traceplots")
## acf plots
mcmc_acf(as.matrix(model1_fit, pars=c("alpha","beta","gamma","sigma")))
# for model comparison
log_lik_1 <- extract_log_lik(model1_fit)
loo_1 <- loo(log_lik_1)
## Warning: Relative effective sample sizes ('r_eff' argument) not specified.
## For models fit with MCMC, the reported PSIS effective sample sizes and
## MCSE estimates will be over-optimistic.
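As the warning suggests, the PSIS diagnostics are more reliable if the relative effective sample sizes are supplied; a sketch of the fix using the loo package:

```r
library(loo)
# keep chains separate so relative_eff can be computed
log_lik_1 <- extract_log_lik(model1_fit, merge_chains = FALSE)
r_eff_1   <- relative_eff(exp(log_lik_1))
loo_1     <- loo(log_lik_1, r_eff = r_eff_1)
```

The same pattern applies to the later calls to extract_log_lik and loo.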
An Rhat value equal to one indicates convergence of the chains, which is confirmed by the well-mixed traceplots. The autocorrelation also decreases quickly for all parameters.
# relationship bw estimated parameters
mcmc_scatter(as.matrix(model1_fit, pars = c('alpha','beta')), alpha = 0.2)
mcmc_scatter(as.matrix(model1_fit, pars = c('beta','gamma')), alpha = 0.2)
mcmc_scatter(as.matrix(model1_fit, pars = c('alpha','sigma')), alpha = 0.2)
The scatterplots show a relationship between the posterior draws of beta and gamma.
# real vs estimated checks
y_rep <- as.matrix(model1_fit, pars = "y_rep")
y <- milan.mort$avg.mort
pp_check_density(y_rep, y=y)
ppc_stat(y = y, yrep = y_rep, stat = "mean")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ppc_stat(y = y, yrep = y_rep, stat = "sd")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In the posterior predictive check, the simulated distributions resemble the observed one.
The residuals look mostly positive, which means the model underestimates the real outcomes.
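The posterior predictive checks rely on a few helper functions (pp_check_density, std_residuals, out_perc) that are not defined in the text; minimal sketches consistent with how they are called, built on bayesplot and ggplot2 (the names and exact behaviour are assumptions), might be:

```r
library(bayesplot)
library(ggplot2)

# Overlay the observed density on densities of a subset of replicated draws
# (thin wrapper around bayesplot::ppc_dens_overlay).
pp_check_density <- function(y_rep, y, draws = 100) {
  ppc_dens_overlay(y = y, yrep = y_rep[sample(nrow(y_rep), draws), ])
}

# Standardized residuals: (observed - posterior predictive mean) / sd.
std_residuals <- function(y_rep, y) {
  res <- (y - colMeans(y_rep)) / apply(y_rep, 2, sd)
  ggplot(data.frame(obs = seq_along(res), res = res), aes(obs, res)) +
    geom_point(alpha = 0.3) +
    geom_hline(yintercept = c(-2, 0, 2), linetype = "dashed")
}

# Proportion of observations outside their 95% posterior predictive interval.
out_perc <- function(y_rep, y, prob = 0.95) {
  lo <- apply(y_rep, 2, quantile, probs = (1 - prob) / 2)
  hi <- apply(y_rep, 2, quantile, probs = 1 - (1 - prob) / 2)
  mean(y < lo | y > hi)
}
```

Under this reading, out_perc returns the share of observations falling outside their 95% posterior predictive interval.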
ggplot(milan.mort, aes(x = mean.temp, y = avg.mort)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE)
The plot shows an evident relationship between the average number of deaths and the mean daily temperature, so I add the temperature information to the model \[ \text{avg_mort}_i \sim \mathcal{N}(\mu_i,\sigma^2)\\ \mu_i = \alpha + \beta\; \text{SO2}_i + \gamma \; \text{TSP}_i + \delta \; \text{mean_temp}_i \] I use the same priors as in the first model, plus a \(\mathcal{N}(0,30)\) prior on \(\delta\).
## compile the model
model2 <- stan_model("model2.stan", auto_write = TRUE)
## fit the model
model2_fit <- sampling(model2, data = data, cores = 4)
saveRDS(model2_fit, "model2_fit.rds")
# Load fitted model
model2_fit <- readRDS("model2_fit.rds")
print(model2_fit, pars = c('alpha','beta','gamma','delta','sigma'))
## Inference for Stan model: model2.
## 4 chains, each with iter=2000; warmup=1000; thin=1;
## post-warmup draws per chain=1000, total post-warmup draws=4000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha 34.06 0.01 0.32 33.43 33.85 34.07 34.27 34.67 1103 1
## beta 0.02 0.00 0.00 0.02 0.02 0.02 0.02 0.02 2053 1
## gamma -0.01 0.00 0.00 -0.01 -0.01 -0.01 -0.01 -0.01 3155 1
## delta -0.27 0.00 0.01 -0.29 -0.28 -0.27 -0.26 -0.25 946 1
## sigma 4.43 0.00 0.05 4.33 4.39 4.43 4.46 4.54 1629 1
##
## Samples were drawn using NUTS(diag_e) at Tue Jun 18 09:32:32 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
# for model comparison
log_lik_2 <- extract_log_lik(model2_fit)
loo_2 <- loo(log_lik_2)
## Warning: Relative effective sample sizes ('r_eff' argument) not specified.
## For models fit with MCMC, the reported PSIS effective sample sizes and
## MCSE estimates will be over-optimistic.
## pp check
y_rep <- as.matrix(model2_fit, pars = "y_rep")
y <- milan.mort$avg.mort
pp_check_density(y_rep, y)
std_residuals(y_rep, y)
out_perc(y_rep, y)
## [1] 0.01398026
ppc_stat(y = y, yrep = y_rep, stat = "mean")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ppc_stat(y = y, yrep = y_rep, stat = "sd")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# grouped data
ppc_stat_grouped(
y = y,
yrep = y_rep,
group = milan.mort$day.of.week,
stat = 'mean',
binwidth = 0.2
)
ppc_stat_grouped(
y = y,
yrep = y_rep,
group = milan.mort$day.of.week,
stat = 'sd',
binwidth = 0.2
)
Even though the chains converge correctly and the autocorrelation decreases, the predictions of the mean value are not always plausible across the different days of the week.
We can account for the hierarchical structure of the data across weekdays and model the variation among them. In this case the response variable is tot.mort, because it would make no sense to model daily variation on averages computed over several previous days. Both the intercept and the standard deviation are estimated separately for each day of the week, and this time the priors are \(\mathcal{N}(0,10)\).
\[ \text{tot_mort}_{i} \sim \mathcal{N}(\mu_{i},\sigma_{d[i]}^2)\\ \mu_{i} = \alpha_{d[i]} + \beta\; \text{SO2}_i + \gamma \; \text{TSP}_i + \delta \; \text{mean_temp}_i \] where \(d[i]\) denotes the day of the week of observation \(i\).
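A sketch of what model3.stan might contain, with group-specific intercepts and standard deviations indexed by day of the week (the file is not shown in the text, so the sigma prior and the treatment of holiday are assumptions):

```stan
data {
  int<lower=1> N;
  int<lower=1> D;                     // number of groups (7 weekdays, or 12 months)
  vector[N] tot_mort;
  vector[N] SO2;
  vector[N] TSP;
  vector[N] mean_temp;
  int<lower=1, upper=D> day_of_week[N];
  int<lower=0, upper=1> holiday[N];   // passed in the data list; usage not shown
}
parameters {
  vector[D] alpha;                    // one intercept per group
  real beta;
  real gamma;
  real delta;
  vector<lower=0>[D] sigma;           // one sd per group
}
model {
  alpha ~ normal(0, 10);
  beta  ~ normal(0, 10);
  gamma ~ normal(0, 10);
  delta ~ normal(0, 10);
  sigma ~ cauchy(0, 10);
  for (n in 1:N)
    tot_mort[n] ~ normal(alpha[day_of_week[n]] + beta * SO2[n]
                         + gamma * TSP[n] + delta * mean_temp[n],
                         sigma[day_of_week[n]]);
}
generated quantities {
  vector[N] y_rep;
  vector[N] log_lik;
  for (n in 1:N) {
    real mu = alpha[day_of_week[n]] + beta * SO2[n]
              + gamma * TSP[n] + delta * mean_temp[n];
    y_rep[n]   = normal_rng(mu, sigma[day_of_week[n]]);
    log_lik[n] = normal_lpdf(tot_mort[n] | mu, sigma[day_of_week[n]]);
  }
}
```

Because the grouping variable enters only through the day_of_week index, the same compiled model can later be refitted with months as groups by setting D = 12.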
# reload data
data(milan.mort)
# hierarchical data
hier_data <- list(
N = nrow(milan.mort),
D = length(unique(milan.mort$day.of.week)),
tot_mort = milan.mort$tot.mort,
SO2 = milan.mort$SO2+50,
TSP = milan.mort$TSP+30,
mean_temp = milan.mort$mean.temp,
day_of_week = milan.mort$day.of.week,
holiday = milan.mort$holiday
)
str(hier_data)
## List of 8
## $ N : int 3652
## $ D : int 7
## $ tot_mort : num [1:3652] 45 32 37 33 36 45 46 38 29 39 ...
## $ SO2 : num [1:3652] 317 425 326 490 404 ...
## $ TSP : num [1:3652] 140 183 192 228 265 ...
## $ mean_temp : num [1:3652] 5.6 4.1 4.6 2.9 2.2 0.7 -0.6 -0.5 0.2 1.7 ...
## $ day_of_week: int [1:3652] 2 3 4 5 6 7 1 2 3 4 ...
## $ holiday : int [1:3652] 1 0 0 0 0 0 0 0 0 0 ...
## compile the model
model3 <- stan_model("model3.stan", auto_write = TRUE)
## fit the model
model3_fit <- sampling(model3, data = hier_data, iter = 1000, chains = 4, cores=2)
saveRDS(model3_fit, "model3_fit.rds")
# Load fitted model
model3_fit <- readRDS("model3_fit.rds")
print(model3_fit, pars = c('alpha','beta','gamma','delta','sigma'))
## Inference for Stan model: model3.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha[1] 32.83 0.02 0.57 31.72 32.44 32.84 33.22 33.90 875 1
## alpha[2] 32.99 0.02 0.57 31.86 32.61 32.99 33.37 34.10 886 1
## alpha[3] 32.41 0.02 0.56 31.34 32.01 32.42 32.80 33.44 1082 1
## alpha[4] 32.69 0.02 0.58 31.56 32.31 32.69 33.08 33.80 777 1
## alpha[5] 32.39 0.02 0.58 31.23 32.00 32.39 32.79 33.53 916 1
## alpha[6] 32.16 0.02 0.55 31.08 31.78 32.17 32.55 33.22 787 1
## alpha[7] 32.17 0.02 0.52 31.15 31.82 32.17 32.53 33.19 868 1
## beta 0.02 0.00 0.00 0.02 0.02 0.02 0.02 0.02 1222 1
## gamma -0.01 0.00 0.00 -0.01 -0.01 -0.01 0.00 0.00 2038 1
## delta -0.23 0.00 0.02 -0.26 -0.24 -0.23 -0.21 -0.19 772 1
## sigma[1] 7.14 0.00 0.22 6.71 6.98 7.14 7.29 7.57 2112 1
## sigma[2] 6.72 0.00 0.21 6.33 6.57 6.71 6.86 7.17 1855 1
## sigma[3] 6.52 0.00 0.20 6.15 6.39 6.52 6.66 6.92 2179 1
## sigma[4] 6.85 0.00 0.21 6.47 6.71 6.84 6.99 7.26 2152 1
## sigma[5] 6.99 0.01 0.22 6.58 6.83 6.98 7.14 7.43 1921 1
## sigma[6] 6.59 0.00 0.20 6.21 6.45 6.59 6.73 7.00 2143 1
## sigma[7] 6.68 0.01 0.21 6.30 6.53 6.67 6.82 7.13 1389 1
##
## Samples were drawn using NUTS(diag_e) at Tue Jun 18 09:37:08 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
#computing psis-looic
log_lik_3 <- extract_log_lik(model3_fit)
loo_3 <- loo(log_lik_3)
## Warning: Relative effective sample sizes ('r_eff' argument) not specified.
## For models fit with MCMC, the reported PSIS effective sample sizes and
## MCSE estimates will be over-optimistic.
# areas plots
mcmc_areas(as.matrix(model3_fit, pars = c('alpha')))
mcmc_areas(as.matrix(model3_fit, pars = c('sigma')))
## traceplots
plot(model3_fit, plotfun = "trace", pars = c('alpha','beta','gamma','sigma')) + ggtitle("traceplots")
## acf plots
mcmc_acf(as.matrix(model3_fit, pars=c("alpha","beta","gamma","sigma")))
## parameters distributions
mcmc_hist(as.matrix(model3_fit, pars = c('alpha','sigma')))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## pp checks
y_rep <- as.matrix(model3_fit, pars = "y_rep")
y <- milan.mort$tot.mort
pp_check_density(y_rep, y)
# real mean vs estimated mean
ppc_stat_grouped(
y = y,
yrep = y_rep,
group = milan.mort$day.of.week,
stat = 'mean',
binwidth = 0.2
)
ppc_stat_grouped(
y = y,
yrep = y_rep,
group = milan.mort$day.of.week,
stat = 'sd',
binwidth = 0.2
)
Now the standardized residuals look fine, and the model correctly describes the average number of deaths across the different days of the week.
# add month information
milan.mort <- milan.mort %>%
mutate(date = as.Date("1980-01-01") + day.num - 1) %>%
mutate(month = month(date))
# hierarchical data
hier_data <- list(
N = nrow(milan.mort),
D = length(unique(milan.mort$month)),
tot_mort = milan.mort$tot.mort,
SO2 = milan.mort$SO2+50,
TSP = milan.mort$TSP+30,
mean_temp = milan.mort$mean.temp,
day_of_week = milan.mort$month, # month index reused in the day_of_week slot so model3.stan can be refitted unchanged
holiday = milan.mort$holiday
)
str(hier_data)
## List of 8
## $ N : int 3652
## $ D : int 12
## $ tot_mort : num [1:3652] 45 32 37 33 36 45 46 38 29 39 ...
## $ SO2 : num [1:3652] 317 425 326 490 404 ...
## $ TSP : num [1:3652] 140 183 192 228 265 ...
## $ mean_temp : num [1:3652] 5.6 4.1 4.6 2.9 2.2 0.7 -0.6 -0.5 0.2 1.7 ...
## $ day_of_week: num [1:3652] 1 1 1 1 1 1 1 1 1 1 ...
## $ holiday : int [1:3652] 1 0 0 0 0 0 0 0 0 0 ...
## fit the model
model3_fit2 <- sampling(model3, data = hier_data, iter = 1000, chains = 4, cores=4)
saveRDS(model3_fit2, "model3_fit2.rds")
# Load fitted model
model3_fit2 <- readRDS("model3_fit2.rds")
print(model3_fit2, pars = c('alpha','beta','gamma','delta','sigma'))
## Inference for Stan model: model3.
## 4 chains, each with iter=1000; warmup=500; thin=1;
## post-warmup draws per chain=500, total post-warmup draws=2000.
##
## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
## alpha[1] 31.80 0.02 0.65 30.53 31.35 31.80 32.24 33.02 1764 1
## alpha[2] 31.45 0.02 0.67 30.13 31.00 31.45 31.92 32.76 1784 1
## alpha[3] 31.40 0.02 0.61 30.24 30.99 31.41 31.82 32.55 1155 1
## alpha[4] 28.46 0.02 0.62 27.28 28.06 28.46 28.87 29.68 1290 1
## alpha[5] 27.45 0.02 0.73 26.07 26.95 27.47 27.92 28.92 1270 1
## alpha[6] 26.02 0.02 0.86 24.43 25.42 26.04 26.59 27.72 1193 1
## alpha[7] 23.27 0.03 1.01 21.33 22.56 23.26 23.95 25.29 991 1
## alpha[8] 19.54 0.03 0.94 17.76 18.87 19.55 20.19 21.40 1054 1
## alpha[9] 22.90 0.02 0.81 21.37 22.35 22.88 23.45 24.54 1136 1
## alpha[10] 27.82 0.02 0.68 26.45 27.33 27.83 28.28 29.17 1145 1
## alpha[11] 28.68 0.02 0.60 27.54 28.27 28.69 29.10 29.85 1521 1
## alpha[12] 30.01 0.01 0.59 28.85 29.62 30.01 30.42 31.12 1845 1
## beta 0.02 0.00 0.00 0.02 0.02 0.02 0.02 0.02 2511 1
## gamma -0.01 0.00 0.00 -0.01 -0.01 -0.01 -0.01 0.00 2539 1
## delta 0.16 0.00 0.04 0.08 0.13 0.16 0.18 0.23 1158 1
## sigma[1] 6.88 0.01 0.27 6.34 6.69 6.87 7.06 7.41 2325 1
## sigma[2] 8.02 0.01 0.34 7.37 7.80 8.01 8.26 8.71 2869 1
## sigma[3] 6.90 0.01 0.28 6.37 6.71 6.89 7.09 7.46 2764 1
## sigma[4] 5.91 0.00 0.24 5.46 5.74 5.91 6.07 6.42 2528 1
## sigma[5] 5.77 0.00 0.24 5.33 5.60 5.76 5.93 6.25 2604 1
## sigma[6] 5.99 0.01 0.24 5.55 5.82 5.98 6.15 6.48 1330 1
## sigma[7] 7.96 0.01 0.32 7.37 7.74 7.95 8.18 8.61 1963 1
## sigma[8] 6.31 0.01 0.25 5.86 6.14 6.30 6.48 6.84 2039 1
## sigma[9] 5.57 0.01 0.23 5.13 5.41 5.56 5.72 6.06 1987 1
## sigma[10] 6.19 0.01 0.25 5.71 6.02 6.19 6.35 6.71 1936 1
## sigma[11] 6.22 0.00 0.25 5.74 6.04 6.20 6.38 6.76 2693 1
## sigma[12] 6.09 0.00 0.25 5.63 5.92 6.08 6.24 6.61 2693 1
##
## Samples were drawn using NUTS(diag_e) at Tue Jun 18 09:46:42 2019.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at
## convergence, Rhat=1).
#computing psis-looic
log_lik_4 <- extract_log_lik(model3_fit2)
loo_4 <- loo(log_lik_4)
## Warning: Relative effective sample sizes ('r_eff' argument) not specified.
## For models fit with MCMC, the reported PSIS effective sample sizes and
## MCSE estimates will be over-optimistic.
# areas plots
mcmc_areas(as.matrix(model3_fit2, pars = c('alpha')))
mcmc_areas(as.matrix(model3_fit2, pars = c('sigma')))
## traceplots
plot(model3_fit2, plotfun = "trace", pars = c('alpha','beta','gamma','sigma')) + ggtitle("traceplots")
## acf plots
mcmc_acf(as.matrix(model3_fit2, pars=c("alpha","beta","gamma","sigma")))
## parameters distributions
mcmc_hist(as.matrix(model3_fit2, pars = c('alpha','sigma')))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## pp checks
y_rep <- as.matrix(model3_fit2, pars = "y_rep")
y <- milan.mort$tot.mort
pp_check_density(y_rep, y)
std_residuals(y_rep, y)
out_perc(y_rep, y)
## [1] 0.07667032
ppc_stat_grouped(
y = y,
yrep = y_rep,
group = milan.mort$month,
stat = 'mean',
binwidth = 0.2
)
ppc_stat_grouped(
y = y,
yrep = y_rep,
group = milan.mort$month,
stat = 'sd',
binwidth = 0.2
)
Now I consider a GLM with a Poisson-distributed response for tot.mort, comparing the fitted response values with those obtained previously.
# The canonical link function for the Poisson model is the logarithm.
glm_fit <- stan_glm (
formula=tot.mort ~ log(SO2+50) + log(TSP+30) + mean.temp,
data=milan.mort,
family=poisson(),
prior=normal(0,10),
prior_intercept = normal(0,10),
chains = 4, iter=1000)
saveRDS(glm_fit, "glm_fit.rds")
# Load fitted model
glm_fit <- readRDS("glm_fit.rds")
summary(glm_fit)
##
## Model Info:
##
## function: stan_glm
## family: poisson [log]
## formula: tot.mort ~ log(SO2 + 50) + log(TSP + 30) + mean.temp
## algorithm: sampling
## priors: see help('prior_summary')
## sample: 2000 (posterior sample size)
## observations: 3652
## predictors: 4
##
## Estimates:
## mean sd 2.5% 25% 50% 75%
## (Intercept) 2.9 0.0 2.8 2.9 2.9 3.0
## log(SO2 + 50) 0.2 0.0 0.1 0.1 0.2 0.2
## log(TSP + 30) 0.0 0.0 -0.1 0.0 0.0 0.0
## mean.temp 0.0 0.0 0.0 0.0 0.0 0.0
## mean_PPD 31.9 0.1 31.6 31.8 31.9 32.0
## log-posterior -12232.1 1.4 -12235.6 -12232.8 -12231.8 -12231.0
## 97.5%
## (Intercept) 3.0
## log(SO2 + 50) 0.2
## log(TSP + 30) 0.0
## mean.temp 0.0
## mean_PPD 32.1
## log-posterior -12230.2
##
## Diagnostics:
## mcse Rhat n_eff
## (Intercept) 0.0 1.0 1923
## log(SO2 + 50) 0.0 1.0 1442
## log(TSP + 30) 0.0 1.0 2081
## mean.temp 0.0 1.0 1555
## mean_PPD 0.0 1.0 562
## log-posterior 0.1 1.0 573
##
## For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
print(glm_fit$coefficients)
## (Intercept) log(SO2 + 50) log(TSP + 30) mean.temp
## 2.935556556 0.154383082 -0.033641469 -0.004833856
# traceplots
plot(glm_fit, plotfun = "trace") + ggtitle("glm traceplots")
# acf plots
mcmc_acf(as.matrix(glm_fit, pars=c("(Intercept)","log(SO2 + 50)","log(TSP + 30)","mean.temp")))
# final comparison
# avg.mort response
print(loo_1) # no pooling model
##
## Computed from 4000 by 3648 log-likelihood matrix
##
## Estimate SE
## elpd_loo -10839.1 53.6
## p_loo 4.7 0.3
## looic 21678.3 107.2
## ------
## Monte Carlo SE of elpd_loo is 0.0.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
print(loo_2) # no pooling with temperature
##
## Computed from 4000 by 3648 log-likelihood matrix
##
## Estimate SE
## elpd_loo -10607.2 61.9
## p_loo 6.3 0.5
## looic 21214.3 123.7
## ------
## Monte Carlo SE of elpd_loo is 0.0.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
# tot.mort response
print(loo_3) # hierarchical on weekday
##
## Computed from 2000 by 3652 log-likelihood matrix
##
## Estimate SE
## elpd_loo -12178.6 54.5
## p_loo 21.3 1.9
## looic 24357.1 109.0
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
print(loo(glm_fit)) # glm
##
## Computed from 2000 by 3652 log-likelihood matrix
##
## Estimate SE
## elpd_loo -12229.2 71.5
## p_loo 6.2 0.2
## looic 24458.4 142.9
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
print(loo_4) # hierarchical on month
##
## Computed from 2000 by 3652 log-likelihood matrix
##
## Estimate SE
## elpd_loo -11993.0 51.6
## p_loo 31.5 2.0
## looic 23986.0 103.1
## ------
## Monte Carlo SE of elpd_loo is 0.1.
##
## All Pareto k estimates are good (k < 0.5).
## See help('pareto-k-diagnostic') for details.
Comparing the fitted models for the two considered response variables, avg.mort and tot.mort, the lowest LOOIC values are achieved by the temperature model for the average number of deaths and by the hierarchical model on months for the total number of deaths.
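These comparisons can be made explicit with loo::loo_compare. Note that LOOIC values are only comparable between models fitted to the same response and the same observations, so the avg.mort models (3,648 rows) and the tot.mort models (3,652 rows) form two separate comparisons. A sketch:

```r
library(loo)
# models on avg.mort
loo_compare(loo_1, loo_2)
# models on tot.mort
loo_compare(loo_3, loo_4, loo(glm_fit))
```

loo_compare lists the models from best to worst and reports the elpd difference of each model from the best one, together with its standard error.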